Comrad: detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data

Authors

  • Andrew McPherson
  • Chunxiao Wu
  • Iman Hajirasouliha
  • Fereydoun Hormozdiari
  • Faraz Hach
  • Anna Lapuk
  • Stanislav Volik
  • Sohrab P. Shah
  • Colin Collins
  • Süleyman Cenk Sahinalp
Abstract

MOTIVATION: Comrad is a novel algorithmic framework for the integrated analysis of RNA-Seq and whole genome shotgun sequencing (WGSS) data for the purposes of discovering genomic rearrangements and aberrant transcripts. The Comrad framework leverages the advantages of both RNA-Seq and WGSS data, providing accurate classification of rearrangements as expressed or not expressed and accurate classification of the genomic or non-genomic origin of aberrant transcripts. A major benefit of Comrad is its ability to accurately identify aberrant transcripts and associated rearrangements using low coverage genome data. As a result, a Comrad analysis can be performed at a cost comparable to that of two RNA-Seq experiments, significantly lower than an analysis requiring high coverage genome data.

RESULTS: We have applied Comrad to the discovery of gene fusions and read-throughs in prostate cancer cell line C4-2, a derivative of the LNCaP cell line with androgen-independent characteristics. As a proof of concept, we have rediscovered in the C4-2 data 4 of the 6 fusions previously identified in LNCaP. We also identified six novel fusion transcripts and associated genomic breakpoints, and verified their existence in LNCaP, suggesting that Comrad may be more sensitive than previous methods that have been applied to fusion discovery in LNCaP. We show that many of the gene fusions discovered using Comrad would be difficult to identify using currently available techniques.

AVAILABILITY: A C++ and Perl implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/.
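The two classifications named in the abstract (expressed versus not expressed rearrangements, and genomic versus non-genomic origin of aberrant transcripts) can be illustrated with a toy decision rule. The sketch below is not Comrad's algorithm: the `Candidate` record, the `classify` function, and the read-count thresholds `min_rna` and `min_dna` are all hypothetical names and values chosen only to show how evidence from the two data types might be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate event with read support from each data type."""
    name: str
    rna_reads: int  # RNA-Seq reads supporting an aberrant transcript junction
    dna_reads: int  # WGSS reads supporting a genomic breakpoint


def classify(c: Candidate, min_rna: int = 2, min_dna: int = 2) -> str:
    """Label a candidate by which data types support it.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    has_rna = c.rna_reads >= min_rna
    has_dna = c.dna_reads >= min_dna
    if has_rna and has_dna:
        return "expressed rearrangement"
    if has_dna:
        return "rearrangement, not expressed"
    if has_rna:
        return "aberrant transcript, non-genomic origin"
    return "unsupported"


if __name__ == "__main__":
    examples = [
        Candidate("fusion_A", rna_reads=12, dna_reads=3),
        Candidate("breakpoint_B", rna_reads=0, dna_reads=5),
        Candidate("read_through_C", rna_reads=8, dna_reads=0),
    ]
    for c in examples:
        print(c.name, "->", classify(c))
```

A low `min_dna` threshold in this toy rule mirrors the abstract's point that low coverage genome data can suffice once RNA-Seq evidence is available for the same event.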


Similar resources

Comrad: a novel algorithmic framework for the integrated analysis of RNA-Seq and WGSS data

Motivation: Comrad is a novel algorithmic framework for the integrated analysis of RNA-Seq and Whole Genome Shotgun Sequencing (WGSS) data for the purposes of discovering genomic rearrangements and aberrant transcripts. The Comrad framework leverages the advantages of both RNA-Seq and WGSS data, providing accurate classification of rearrangements as expressed or not expressed and accurate class...


nFuse: Discovery of complex genomic rearrangements in cancer using high-throughput sequencing Supplementary Text

Supplemental nFuse pipeline overview The nFuse method builds upon Comrad (McPherson et al., 2011b), our previous work on rearrangement detection in matched RNA-seq and WGSS. We begin this section by briefly describing Comrad, then describe significant differences between Comrad and nFuse. An overview of the nFuse pipeline is shown in Figure 1.


Development of Strategies for SNP Detection in RNA-Seq Data: Application to Lymphoblastoid Cell Lines and Evaluation Using 1000 Genomes Data

Next-generation RNA sequencing (RNA-seq) maps and analyzes transcriptomes and generates data on sequence variation in expressed genes. There are few reported studies on analysis strategies to maximize the yield of quality RNA-seq SNP data. We evaluated the performance of different SNP-calling methods following alignment to both genome and transcriptome by applying them to RNA-seq data from a Ha...


Investigating the Function of Predicted Proteins from RNA-Seq Data in Holstein and Cholistani Cattle Breeds

This study was performed to determine the digital expression profile of different genes expressed in Holstein and Cholistani breeds as well as to evaluate the performance of predicted proteins derived from differentially expressed genes between these two breeds using RNA-Seq data. For this purpose, the whole mRNA sequence for a blood sample of American Holstein and Pakistani Cholistani cattle p...


Integrated RNA and DNA sequencing improves mutation detection in low purity tumors

Identifying somatic mutations is critical for cancer genome characterization and for prioritizing patient treatment. DNA whole exome sequencing (DNA-WES) is currently the most popular technology; however, this yields low sensitivity in low purity tumors. RNA sequencing (RNA-seq) covers the expressed exome with depth proportional to expression. We hypothesized that integrating DNA-WES and RNA-se...



Journal:
  • Bioinformatics

Volume 27, Issue 11

Pages -

Publication date: 2011